🔮 Predictive power
Pure C implementation of the TurboQuant paper (ICLR 2026) for KV cache compression in LLM inference.
🦙 Ollama
github.com · 1d · r/LocalLLaMA
Bloom's 2 Sigma Problem
💡 New and interesting problems
rochan.bearblog.dev · 3d
Activating Two Trap Cards at Once, or: A Gentle Response to the Popularity of Vibecoding
🏛️ NetHack
gist.github.com · 6h · Lobsters
Qwen 3.5 9B LLM GGUF quantized for local structured extraction
🦙 Ollama
huggingface.co · 1d · r/LocalLLaMA
Boyer-Moore Majority Element
⚖ optimizing for consensus
NULL BITMAP by Justin Jaffray via buttondown.com · 3d
The Results Are In!
🏆 Monumental success
justemile.bearblog.dev · 2d
Chinese Named Entity Recognition Model Selection for Small Pure CPU Environments
🦙 Ollama
lawtee.com · 2d
Demystifying Softmax Loss: A Step-by-Step Derivation for Linear Classifiers
🧮 Vector Databases
blog.aeilot.top · 5d
Aurora
🦙 Ollama
together.ai · 2d · Hacker News
Make Your Prompts Boring
🦙 Ollama
jigarkdoshi.bearblog.dev · 4d
Scaling Beyond 4k
🧮 Vector Databases
rishirajacharya.com · 2d
What if AI doesn’t need more RAM but better math?
🦙 Ollama
adlrocha.substack.com · 4d · Substack
Learning by Building, Part 3: Caltrain Bot, Local LLMs, and Reality
🦙 Ollama
dima.us.kg · 3d
Inference Engines — A visual deep dive into the journey of a token down the transformer layers
🧮 Vector Databases
femiadeniran.com · 4d · r/LocalLLaMA
Bigoish: Test the empirical computational complexity of Rust algorithms
🧮 Vector Databases
docs.rs · 6d · Lobsters, Hacker News
Standard LoRA is quietly losing 68% of quality on FP8 hardware and most people have no idea
🦙 Ollama
koscak.ai · 6d · r/LocalLLaMA
Some Local Aspects of AI
🦙 Ollama
blog.raymond.burkholder.net · 4d
yashkc2025/turboquant: Python implementation of TurboQuant (arXiv 2504.19874). Data-oblivious, near-optimal 1–4 bit vector quantization for streaming KV-caches and databases.
🧮 Vector Databases
github.com · 4d · r/LocalLLaMA
Fused INT8 Weight-Only Quantization in Pallas
🧮 Vector Databases
rishirajacharya.com · 3d
TurboQuant for weights: near‑optimal 4‑bit LLM quantization with lossless 8‑bit residual
🦙 Ollama
github.com · 6d · r/LocalLLaMA